Like Figure 18-7, every ROC graph has sensitivity running up the Y axis, displayed either as
fractions between 0 and 1 or as percentages between 0 and 100. The X axis is presented either from
left to right as 1 - specificity or, as in Figure 18-7, as specificity labeled backwards (from
right to left) along the X axis.
Most ROC curves lie in the upper-left part of the graph area. The farther the curve lies above the
diagonal line, toward the upper-left corner, the better the predictive model is. For a nearly
perfect model, the ROC curve runs up along the Y axis from the lower-left corner to the upper-left
corner, then along the top of the graph from the upper-left corner to the upper-right corner.
Because of how sensitivity and specificity are calculated, the graph appears as a series of steps. If you
have a large data set, your graph will have more and smaller steps. For clarity, we show the cut values
for predicted probability as a scale along the ROC curve itself in Figure 18-7, but unfortunately, most
statistical software doesn’t do this for you.
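To see where the steps come from, here is a minimal sketch in Python, not tied to any particular statistical package, that computes sensitivity and specificity at a few cut values. The data and the function name roc_points are made up for illustration, and the sketch assumes a subject is classified as positive when the predicted probability is at or above the cut value.

```python
def roc_points(probs, outcomes, cuts):
    """Return (cut, sensitivity, specificity) for each cut value.

    Assumption: a subject is predicted positive when prob >= cut.
    """
    n_pos = sum(outcomes)                 # subjects who had the outcome
    n_neg = len(outcomes) - n_pos         # subjects who did not
    points = []
    for cut in cuts:
        # True positives: had the outcome and were predicted positive
        tp = sum(1 for p, y in zip(probs, outcomes) if y == 1 and p >= cut)
        # True negatives: did not have the outcome and were predicted negative
        tn = sum(1 for p, y in zip(probs, outcomes) if y == 0 and p < cut)
        points.append((cut, tp / n_pos, tn / n_neg))
    return points


# Hypothetical predicted probabilities and observed outcomes (1 = yes, 0 = no)
probs = [0.10, 0.25, 0.35, 0.40, 0.55, 0.65, 0.80, 0.90]
outcomes = [0, 0, 1, 0, 1, 0, 1, 1]

for cut, sens, spec in roc_points(probs, outcomes, [0.3, 0.5, 0.7]):
    print(f"cut {cut:.1f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
# cut 0.3: sensitivity 1.00, specificity 0.50
# cut 0.5: sensitivity 0.75, specificity 0.75
# cut 0.7: sensitivity 0.50, specificity 1.00
```

Moving the cut value past one subject's predicted probability changes either the true positive count or the true negative count by one, which is why the curve moves in steps rather than as a smooth line.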
Looking at the ROC curve helps you choose a cut value that gives the best tradeoff between
sensitivity and specificity:
To have very few false positives: Choose a higher cut value to give high specificity. Figure 18-7
shows that by setting the cut value to 0.6, you can simultaneously achieve about 93 percent
specificity and 87 percent sensitivity.
To have very few false negatives: Choose a lower cut value to give high sensitivity. Figure 18-7
shows that if you set the cut value to 0.3, you can achieve almost 100 percent sensitivity, but
your specificity will be only about 75 percent, meaning you’ll have a 25 percent false positive
rate.
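The same choice can be sketched in Python. The table below is illustrative only: the 0.6 and 0.3 rows echo the approximate values quoted from Figure 18-7 (with "almost 100 percent" assumed to be 0.99), and the other rows are made up. The function name best_cut and its tiebreak rule (largest sensitivity plus specificity among the eligible cut values) are assumptions, not something your software necessarily does.

```python
# (cut value, sensitivity, specificity)
# The 0.3 and 0.6 rows approximate the values quoted in the text;
# the 0.2 and 0.8 rows are hypothetical filler for illustration.
roc_table = [
    (0.2, 1.00, 0.60),
    (0.3, 0.99, 0.75),
    (0.6, 0.87, 0.93),
    (0.8, 0.55, 0.98),
]


def best_cut(table, min_sensitivity=0.0, min_specificity=0.0):
    """Pick a cut value that meets both minimum requirements.

    Among eligible rows, return the one with the largest
    sensitivity + specificity (an assumed tiebreak rule).
    Return None when no cut value meets both minimums.
    """
    eligible = [
        row for row in table
        if row[1] >= min_sensitivity and row[2] >= min_specificity
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda row: row[1] + row[2])


# Very few false positives: demand at least 90 percent specificity
print(best_cut(roc_table, min_specificity=0.90))   # (0.6, 0.87, 0.93)

# Very few false negatives: demand at least 95 percent sensitivity
print(best_cut(roc_table, min_sensitivity=0.95))   # (0.3, 0.99, 0.75)
```

If you ask for more than the model can deliver, such as 99 percent sensitivity together with 90 percent specificity, no cut value qualifies and the function returns None.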
The software may optionally display the area under the ROC curve (abbreviated AUC), along with its
standard error and a p value. The AUC is another measure of how good the predictive model is. The
diagonal line has an AUC of 0.5, and the statistical test compares your AUC to that of the diagonal
line. At α = 0.05, a p value less than 0.05 indicates that your model is statistically
significantly better than the diagonal line at accurately predicting your outcome.
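One way to get a feel for the AUC is a well-known equivalence: it equals the proportion of all (positive subject, negative subject) pairs in which the positive subject received the higher predicted probability, counting ties as half. The Python sketch below uses that pairwise definition on made-up data. The function name auc_from_pairs is hypothetical, and the sketch leaves out the standard error and p value that your software reports.

```python
def auc_from_pairs(probs, outcomes):
    """Area under the ROC curve via pairwise comparison.

    Counts how often a subject with the outcome (1) has a higher
    predicted probability than a subject without it (0).
    Tied pairs count as half.
    """
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    score = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                score += 1.0      # model ranks this pair correctly
            elif p_pos == p_neg:
                score += 0.5      # tie: no better than a coin flip
    return score / (len(pos) * len(neg))


# Hypothetical predicted probabilities and observed outcomes (1 = yes, 0 = no)
probs = [0.10, 0.25, 0.35, 0.40, 0.55, 0.65, 0.80, 0.90]
outcomes = [0, 0, 1, 0, 1, 0, 1, 1]
print(auc_from_pairs(probs, outcomes))                       # 0.8125

# A model that gives everyone the same probability sits on the diagonal
print(auc_from_pairs([0.5, 0.5, 0.5, 0.5], [0, 1, 0, 1]))    # 0.5
```

An AUC of 1.0 would mean every subject with the outcome was ranked above every subject without it, and 0.5 matches the diagonal line.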
Heads Up: Knowing What Can Go Wrong with Logistic Regression
Logistic regression presents many of the same potential pitfalls as ordinary least-squares regression
(see Chapters 16 and 17), as well as several that are specific to logistic regression. Watch out for
some of the more common pitfalls:
Don’t fit a logistic function to non-logistic data: Don’t use logistic regression to fit data that
doesn’t behave like the logistic S curve. Plot your grouped data (as shown earlier in Figure 18-